Finding titles representing segments of Wikipedia Articles from keyphrases
Abstract
Wikipedia is a free online encyclopedia that allows anyone to edit or create articles. However, articles tend to become long and complex, so assigning appropriate titles or key phrases to untitled segments helps readers navigate them. In this paper, we present methods for selecting titles that represent article segments. Key phrase extraction has been studied for years, but we concentrate on selecting, from a set of candidate phrases, a title phrase for a given target segment; the selected title needs to reflect both local and global contexts. We evaluate five features that we proposed previously and one new feature based on word embeddings. These features are combined to produce a ranked list of candidate titles. We construct a candidate title set consisting of titles of articles, sections, and subsections, and anchor texts of inner links (inter-article links), where the hidden title of the target segment is the ground truth. We compare the performance of various feature combinations by precision@K, reciprocal rank, and average precision.
Keywords: Wikipedia, Finding Representative Title, Candidate Ranking
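To make the evaluation setup concrete, here is a minimal sketch of ranking candidate titles and scoring the ranking with precision@K, reciprocal rank, and average precision. The abstract does not say how the features are combined, so the weighted sum below is an assumption, and the function names, feature names ("local", "global"), and example scores are all hypothetical.

```python
from typing import Dict, List, Set


def rank_candidates(features: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> List[str]:
    """Rank candidate titles by a weighted sum of their feature scores.

    The linear combination is an illustrative assumption, not the
    combination method used in the paper.
    """
    def score(candidate: str) -> float:
        return sum(weights.get(name, 0.0) * value
                   for name, value in features[candidate].items())

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(features, key=lambda c: (-score(c), c))


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked candidates that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for title in ranked[:k] if title in relevant) / k


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant candidate, or 0.0 if none is found."""
    for rank, title in enumerate(ranked, start=1):
        if title in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a relevant title occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, title in enumerate(ranked, start=1):
        if title in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical candidate titles with two made-up feature scores each.
example_features = {
    "Early life": {"local": 0.5, "global": 0.25},
    "Career": {"local": 1.0, "global": 0.5},
    "History": {"local": 0.25, "global": 0.25},
}
example_weights = {"local": 1.0, "global": 1.0}
example_ranking = rank_candidates(example_features, example_weights)
ground_truth = {"Early life"}  # the hidden title of the target segment

print(example_ranking)
print(precision_at_k(example_ranking, ground_truth, 2))
print(reciprocal_rank(example_ranking, ground_truth))
print(average_precision(example_ranking, ground_truth))
```

When each target segment has exactly one ground-truth title, as in this example, average precision reduces to the reciprocal rank, so the two metrics differ only if several candidates are counted as correct.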
Similar resources
Cross-language Entity Linking Adapting to User’s Language Ability
In this paper, we propose a method to automatically discover valuable keyphrases in Japanese and link these keyphrases to related Chinese Wikipedia pages. The method that we propose has four stages. Firstly, we extract nouns from a Japanese document using a morphological analyzer and extract the candidates of keyphrases using a method called Top Consecutive Nouns Cohesion (TCNC) [1]. Then, we j...
Estimating Reference Scopes of Wikipedia Article Inner-links
Wikipedia is the largest online encyclopedia, and is utilized as a machine-knowledgeable and semantic resource. Links within Wikipedia indicate that two articles, or parts of them, are related in topic. Existing link detection methods focus on article titles because most links in Wikipedia point to article titles. But there are a number of links in Wikipedia pointing to corresponding segment...
Extracting Multilingual Dictionaries for the Teaching
This paper describes a method for creating multilingual dictionaries using Wikipedia as a resource. A lucky strike on the road to multilingual information retrieval, the main idea is simple: taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages produces a multilingual dictionary in all those languages. While the page content...
A Generalized Method for Word Sense Disambiguation Based on Wikipedia
In this paper we propose a general framework for word sense disambiguation using knowledge latent in Wikipedia. Specifically, we exploit the rich and growing Wikipedia corpus in order to achieve a large and robust knowledge repository consisting of keyphrases and their associated candidate topics. Keyphrases are mainly derived from Wikipedia article titles and anchor texts associated with wikil...
Context-Aware In-Page Search
In this paper we introduce a method for searching appropriate articles from knowledge bases (e.g. Wikipedia) for a given query and its context. In our approach, this problem is transformed into a multi-class classification of candidate articles. The method involves automatically augmenting smaller knowledge bases using larger ones and learning to choose adequate articles based on hyperlink simi...
Publication date: 2017